The Integration of a Part-of-Speech Tagger into the ALEP Platform

Authors

  • Thierry Declerck
  • Heinz Dieter Maas
Abstract

We describe how part-of-speech information delivered by a tagger (the MPRO tool) has been integrated into the ALEP (Advanced Language Engineering Platform) system. For this we extended an approach described within the LS-GRAM project, which consisted in defining the Text Handling component of ALEP in such a way that so-called "messy details" are handled within this subsystem, hence keeping the (linguistic) parser free from such tasks. We just extended the tagging strategy used for this purpose to normal words and modified the default tagging of words proposed by the ALEP system in order to incorporate information delivered by the part-of-speech tagger. The resulting tagging is converted by means of "lift" rules into partial linguistic descriptions, which provide the direct input to the grammatical analysis. We show that this procedure substantially reduces the parse times of the system.
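
The flow described in the abstract (tagger output, then "lift" rules, then partial linguistic descriptions handed to the parser) can be illustrated with a minimal sketch. This is not ALEP's rule formalism or MPRO's output format: the tag names, the `LIFT_RULES` table, and the `TaggedToken` and `lift` names are all hypothetical, chosen only to show the idea of mapping a tag to a partial description before grammatical analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedToken:
    """A word form paired with the tag assigned by a part-of-speech tagger."""
    form: str
    pos: str  # hypothetical tag label, not MPRO's actual tagset


# Hypothetical "lift" table: each tag maps to a partial feature description.
# A real system would use its own rule formalism; this dict is an assumption.
LIFT_RULES = {
    "DET": {"cat": "det"},
    "N": {"cat": "n"},
    "V": {"cat": "v"},
}


def lift(token: TaggedToken) -> dict:
    """Convert one tagged token into a partial linguistic description.

    Tags without a rule contribute no category constraint, so the
    description only records the word form.
    """
    description = dict(LIFT_RULES.get(token.pos, {}))
    description["form"] = token.form
    return description


def lift_sentence(tokens: list[TaggedToken]) -> list[dict]:
    """Lift every token; the resulting list would be the parser's input."""
    return [lift(token) for token in tokens]


sentence = [
    TaggedToken("the", "DET"),
    TaggedToken("parser", "N"),
    TaggedToken("runs", "V"),
]
descriptions = lift_sentence(sentence)
```

In this sketch each word reaches the parser with its category already fixed, which is one plausible reading of why the abstract reports shorter parse times: the parser no longer has to consider every category a word form could belong to.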

Similar resources

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, the Penn Treebank, was published. However, although originating in computational linguistics, the value of tree banks is becoming more widely appreciated in linguistics research as a whole. F...

Evaluating the Impact of External Lexical Resources into a CRF-based Multiword Segmenter and Part-of-Speech Tagger

This paper evaluates the impact of external lexical resources into a CRF-based joint Multiword Segmenter and Part-of-Speech Tagger. We especially show different ways of integrating lexicon-based features in the tagging model. We display an absolute gain of 0.5% in terms of f-measure. Moreover, we show that the integration of lexicon-based features significantly compensates the use of a s...

A Comparative Study of the Effect of Part-of-Speech Tagging on Parsing in Automatic Processing of the Persian Language

In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...

A Light Sliding-Window Part-of-Speech Tagger for the Apertium Free/Open-Source Machine Translation Platform

This paper describes a free/open-source implementation of the light sliding-window (LSW) part-of-speech tagger for the Apertium free/open-source machine translation platform. Firstly, the mechanism and training process of the tagger are reviewed, and a new method for incorporating linguistic rules is proposed. Secondly, experiments are conducted to compare the performances of the tagger under d...

Publication date: 1997